Compositional Kernel



Reviews: Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity

Neural Information Processing Systems

In fact, there is almost no discussion of the implications of the paper's results. The notions of "computational skeleton" and "realisation of a skeleton" seem relatively interesting and aim at generalising the kernel constructions proposed in [13] and [29]. This "construction" is in my view the main contribution of the paper. From a theoretical point of view, and from my understanding, the main results of the paper are Theorems 3 and 4. However, these results are not "so surprising", since, as confirmed by the authors, they can be interpreted as a kind of "law of large numbers": the random activation of the network is "counterbalanced" by the replication of the skeleton.
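
The law-of-large-numbers reading is easy to see empirically. Below is a minimal numpy sketch (not from the paper) showing that the empirical kernel of a single random ReLU layer concentrates around its dual kernel, the degree-1 arc-cosine kernel of Cho and Saul (2009), as the layer gets wider; in the paper's terms, replicating the skeleton counterbalances the randomness of the activations.

```python
import numpy as np

# Dual kernel of ReLU on the unit sphere (degree-1 arc-cosine kernel,
# Cho & Saul 2009): k(rho) = (sqrt(1 - rho^2) + (pi - arccos(rho)) * rho) / pi
def relu_dual(rho):
    rho = np.clip(rho, -1.0, 1.0)
    return (np.sqrt(1 - rho**2) + (np.pi - np.arccos(rho)) * rho) / np.pi

rng = np.random.default_rng(0)
d = 50
x, y = rng.normal(size=d), rng.normal(size=d)
x, y = x / np.linalg.norm(x), y / np.linalg.norm(y)
rho = x @ y

for width in [10, 100, 1000, 100000]:
    W = rng.normal(size=(width, d))            # random initialization
    fx, fy = np.maximum(W @ x, 0), np.maximum(W @ y, 0)
    empirical = 2.0 * (fx @ fy) / width        # normalized empirical kernel
    print(f"width={width:6d}  empirical={empirical:.4f}  dual={relu_dual(rho):.4f}")
```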


Mixed neural network Gaussian processes

Lindo, Alexey, Papamarkou, Theodore, Sagitov, Serik, Stewart, Laura

arXiv.org Machine Learning

This paper makes two contributions. Firstly, it introduces mixed compositional kernels and mixed neural network Gaussian processes (NNGPs). Mixed compositional kernels are generated by composition of probability generating functions (PGFs). A mixed NNGP is a Gaussian process (GP) with a mixed compositional kernel, arising in the infinite-width limit of multilayer perceptrons (MLPs) that have a different activation function for each layer. Secondly, $\theta$ activation functions for neural networks and $\theta$ compositional kernels are introduced by building upon the theory of branching processes, and more specifically upon $\theta$ PGFs. While $\theta$ compositional kernels are recursive, they are expressed in closed form. It is shown that $\theta$ compositional kernels have non-degenerate asymptotic properties under certain conditions. Thus, GPs with $\theta$ compositional kernels avoid kernel evaluations via non-explicit recursion and have controllable infinite-depth asymptotic properties. An open research question is whether GPs with $\theta$ compositional kernels are limits of infinitely wide MLPs with $\theta$ activation functions.
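
Since the abstract does not spell out a $\theta$ family, here is a minimal sketch of the underlying idea under two assumptions: inputs are unit-norm, so each layer's kernel reduces to a function of the correlation rho, and each layer's kernel map is an ordinary PGF (geometric and Poisson below stand in for the paper's $\theta$ PGFs). Composing a different PGF per layer mirrors an MLP with a different activation per layer.

```python
import numpy as np

# PGFs of two offspring distributions; each maps [0, 1] to [0, 1] with G(1) = 1.
def geometric_pgf(s, p=0.5):
    return p / (1 - (1 - p) * s)       # Geometric(p) on {0, 1, 2, ...}

def poisson_pgf(s, lam=1.0):
    return np.exp(lam * (s - 1))       # Poisson(lam)

def mixed_compositional_kernel(rho, layer_pgfs):
    """Compose one PGF per layer, mirroring an MLP whose activation
    differs layer by layer (hypothetical interface, not the paper's)."""
    s = rho
    for pgf in layer_pgfs:
        s = pgf(s)
    return s

rho = 0.3   # correlation of two unit-norm inputs
print(mixed_compositional_kernel(rho, [geometric_pgf, poisson_pgf, geometric_pgf]))
```

Iterating a PGF is exactly iterating the offspring distribution of a branching process, so the infinite-depth behaviour of such a kernel is governed by the classical criticality theory of branching processes, which is what makes the asymptotics controllable.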


Learning Compositional Sparse Gaussian Processes with a Shrinkage Prior

Tong, Anh, Tran, Toan, Bui, Hung, Choi, Jaesik

arXiv.org Machine Learning

Choosing a proper set of kernel functions is an important problem in learning Gaussian process (GP) models, since each kernel structure has different model complexity and data fitness. Recently, automatic kernel composition methods have provided not only accurate prediction but also attractive interpretability through search-based methods. However, existing methods suffer from slow kernel composition learning. To tackle large-scale data, we propose a new sparse approximate posterior for GPs, MultiSVGP, constructed from groups of inducing points associated with the individual additive kernels in a compositional kernel. We demonstrate that this approximation provides a better fit for learning compositional kernels from empirical observations. We also provide a theoretical justification of the error bound compared to the traditional sparse GP. In contrast to the search-based approach, we present a novel probabilistic algorithm that learns a kernel composition by handling sparsity in the kernel selection with a Horseshoe prior. We demonstrate that our model can capture the characteristics of time series with significant reductions in computational time and achieves competitive regression performance on real-world data sets.
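
As a toy illustration of the kernel-selection idea (not the authors' MultiSVGP, which also groups inducing points per additive component and learns the weights variationally), the sketch below draws per-kernel weights from a Horseshoe-style prior over an additive composition of base kernels; most weights are shrunk toward zero while a few remain large, mimicking sparse kernel selection.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy base kernels for an additive composition over scalar inputs.
def rbf(X, Y, ell=1.0):
    return np.exp(-0.5 * (X[:, None] - Y[None, :])**2 / ell**2)

def periodic(X, Y, ell=1.0, period=1.0):
    d = np.abs(X[:, None] - Y[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period)**2 / ell**2)

def linear(X, Y):
    return X[:, None] * Y[None, :]

# Horseshoe-style shrinkage: half-Cauchy local scales times a small global
# scale; most sampled weights land near zero, a few stay large.
def sample_horseshoe_weights(n_kernels, tau=0.1):
    lam = np.abs(rng.standard_cauchy(n_kernels))
    return (tau * lam)**2

X = np.linspace(0, 4, 20)
kernels = [rbf, periodic, linear]
w = sample_horseshoe_weights(len(kernels))
K = sum(wi * k(X, X) for wi, k in zip(w, kernels))   # shrunken additive kernel
print("weights:", np.round(w, 4))
```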


Mehler's Formula, Branching Process, and Compositional Kernels of Deep Neural Networks

Liang, Tengyuan, Tran-Bach, Hai

arXiv.org Machine Learning

Kernel methods and deep neural networks are arguably two representative methods that have achieved state-of-the-art results in regression and classification tasks. However, unlike kernel methods, where both the statistical and computational aspects of learning are understood reasonably well, there are still many theoretical puzzles around the generalization, computation, and representation aspects of deep neural networks (Zhang et al., 2017). One hopeful direction for resolving some of these puzzles is through the lens of kernels (Rahimi and Recht, 2008, 2009; Cho and Saul, 2009; Belkin et al., 2018b). Such a connection can be readily observed in a two-layer infinite-width network with random weights; see the pioneering work of Neal (1996a) and of Rahimi and Recht (2008, 2009). For deep networks with hierarchical structures and randomly initialized weights, compositional kernels (Daniely et al., 2017a,b) have been proposed to rigorously characterize this connection, with promising empirical performance (Cho and Saul, 2009).
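
The Mehler connection can be made concrete with a short numpy sketch (an illustration, not the paper's construction): expand ReLU in the probabilists' Hermite basis, use Mehler's formula E[He_j(Z1) He_k(Z2)] = delta_{jk} k! rho^k to obtain the dual kernel as a power series in the correlation rho, check it against the closed-form arc-cosine expression, and iterate it once per layer to get a deep compositional kernel.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval
from math import factorial, pi, sqrt, acos

relu = lambda z: np.maximum(z, 0.0)

# Hermite coefficients c_k = E[relu(Z) He_k(Z)] / k! via Gauss-Hermite quadrature.
x, w = hermegauss(80)                    # weight exp(-x^2 / 2), sum(w) = sqrt(2 pi)
def hermite_coeff(k):
    ek = np.zeros(k + 1); ek[k] = 1.0    # coefficient vector selecting He_k
    return (w @ (relu(x) * hermeval(x, ek))) / (sqrt(2 * pi) * factorial(k))

K = 30
c = np.array([hermite_coeff(k) for k in range(K)])

# Mehler's formula gives E[relu(Z1) relu(Z2)] = sum_k c_k^2 k! rho^k.
def dual_mehler(rho):
    return sum(c[k]**2 * factorial(k) * rho**k for k in range(K))

def dual_closed_form(rho):               # arc-cosine closed form, dual(1) = 1/2
    return (sqrt(1 - rho**2) + (pi - acos(rho)) * rho) / (2 * pi)

rho = 0.4
print(dual_mehler(rho), dual_closed_form(rho))

# Deep compositional kernel: iterate the normalized dual activation per layer.
rho_l = rho
for _ in range(5):
    rho_l = 2 * dual_mehler(rho_l)       # factor 2 keeps unit-norm inputs at 1
print(rho_l)
```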


Neural Kernels Without Tangents

Shankar, Vaishaal, Fang, Alex, Guo, Wenshuo, Fridovich-Keil, Sara, Schmidt, Ludwig, Ragan-Kelley, Jonathan, Recht, Benjamin

arXiv.org Machine Learning

We investigate the connections between neural networks and simple building blocks in kernel space. In particular, using well-established feature-space tools such as direct sum, averaging, and moment lifting, we present an algebra for creating "compositional" kernels from bags of features. We show that these operations correspond to many of the building blocks of "neural tangent kernels (NTK)". Experimentally, we show that there is a correlation in test error between neural network architectures and the associated kernels. We construct a simple neural network architecture, using only 3x3 convolutions and 2x2 average pooling with ReLU, trained with SGD and an MSE loss, that achieves 96% accuracy on CIFAR10, and whose corresponding compositional kernel achieves 90% accuracy. We also use our constructions to investigate the relative performance of neural networks, NTKs, and compositional kernels in the small-dataset regime. In particular, we find that compositional kernels outperform NTKs, and neural networks outperform both kernel methods.
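
A sketch of how the three feature-space operations look on Gram matrices (illustrative; the paper builds these up into full convolutional kernels): a direct sum of features adds kernels, averaging features averages blocks of patchwise kernel evaluations, and moment lifting pushes a kernel through a nonlinearity, here using the degree-1 arc-cosine formula for ReLU as the lift.

```python
import numpy as np

def direct_sum(K1, K2):
    """Concatenating two feature vectors adds their kernels."""
    return K1 + K2

def average_pool(K_patches):
    """Averaging features over p patches averages the p x p block of
    patchwise kernel evaluations; K_patches has shape [p, p, n, n]."""
    return K_patches.mean(axis=(0, 1))

def relu_moment_lift(K):
    """Moment lifting through ReLU, via the degree-1 arc-cosine formula."""
    d = np.sqrt(np.diag(K))
    rho = np.clip(K / np.outer(d, d), -1.0, 1.0)
    theta = np.arccos(rho)
    return np.outer(d, d) * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / np.pi

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))              # 5 inputs with 8 raw features
K = X @ X.T                              # linear kernel of the raw features
K_patches = np.broadcast_to(K, (2, 2, 5, 5))
print(average_pool(K_patches).shape, relu_moment_lift(direct_sum(K, K)).shape)
```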


Deep Neural Networks as Gaussian Processes

Lee, Jaehoon, Bahri, Yasaman, Novak, Roman, Schoenholz, Samuel S., Pennington, Jeffrey, Sohl-Dickstein, Jascha

arXiv.org Machine Learning

It has long been known that a single-layer fully-connected neural network with an i.i.d. prior over its parameters is equivalent to a Gaussian process (GP), in the limit of infinite network width. This correspondence enables exact Bayesian inference for infinite width neural networks on regression tasks by means of evaluating the corresponding GP. Recently, kernel functions which mimic multi-layer random neural networks have been developed, but only outside of a Bayesian framework. As such, previous work has not identified that these kernels can be used as covariance functions for GPs and allow fully Bayesian prediction with a deep neural network. In this work, we derive the exact equivalence between infinitely wide deep networks and GPs. We further develop a computationally efficient pipeline to compute the covariance function for these GPs. We then use the resulting GPs to perform Bayesian inference for wide deep neural networks on MNIST and CIFAR-10. We observe that trained neural network accuracy approaches that of the corresponding GP with increasing layer width, and that the GP uncertainty is strongly correlated with trained network prediction error. We further find that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks. Finally we connect the performance of these GPs to the recent theory of signal propagation in random neural networks.
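
The pipeline the abstract describes, a layerwise kernel recursion followed by exact GP prediction, fits in a few lines for the ReLU case (a sketch with assumed hyperparameters sigma_w and sigma_b; the paper handles general nonlinearities and selects hyperparameters on a grid).

```python
import numpy as np

def nngp_kernel(X1, X2, depth=3, sigma_w=1.4, sigma_b=0.1):
    """ReLU NNGP covariance via the layerwise recursion (closed form for
    ReLU; a sketch, not the paper's general pipeline)."""
    K  = sigma_b**2 + sigma_w**2 * (X1 @ X2.T) / X1.shape[1]
    K1 = sigma_b**2 + sigma_w**2 * (X1 * X1).sum(1) / X1.shape[1]
    K2 = sigma_b**2 + sigma_w**2 * (X2 * X2).sum(1) / X2.shape[1]
    for _ in range(depth):
        norm = np.sqrt(np.outer(K1, K2))
        theta = np.arccos(np.clip(K / norm, -1.0, 1.0))
        E = norm * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)
        K  = sigma_b**2 + sigma_w**2 * E
        K1 = sigma_b**2 + sigma_w**2 * K1 / 2   # E[relu(z)^2] = var(z) / 2
        K2 = sigma_b**2 + sigma_w**2 * K2 / 2
    return K

# Exact Bayesian prediction with the NNGP as the covariance function.
rng = np.random.default_rng(0)
Xtr, Xte = rng.normal(size=(20, 5)), rng.normal(size=(4, 5))
ytr = np.sin(Xtr[:, 0])
Ktt = nngp_kernel(Xtr, Xtr) + 1e-2 * np.eye(20)   # observation noise 1e-2
Kst = nngp_kernel(Xte, Xtr)
mean = Kst @ np.linalg.solve(Ktt, ytr)            # GP posterior mean
print(mean)
```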


Probing the Compositionality of Intuitive Functions

Schulz, Eric, Tenenbaum, Josh, Duvenaud, David K., Speekenbrink, Maarten, Gershman, Samuel J.

Neural Information Processing Systems

How do people learn about complex functional structure? Taking inspiration from other areas of cognitive science, we propose that this is accomplished by harnessing compositionality: complex structure is decomposed into simpler building blocks. We formalize this idea within the framework of Bayesian regression using a grammar over Gaussian process kernels. We show that participants prefer compositional over non-compositional function extrapolations, that samples from the human prior over functions are best described by a compositional model, and that people perceive compositional functions as more predictable than their non-compositional but otherwise similar counterparts. We argue that the compositional nature of intuitive functions is consistent with broad principles of human cognition.
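
A minimal version of such a grammar over Gaussian process kernels (illustrative; the paper's grammar and experiments are richer): base kernels RBF, LIN, and PER are combined by the production K -> base | K + K | K * K, and sampling from the grammar yields compositional covariance structures.

```python
import numpy as np

# Base kernels over scalar inputs.
base = {
    "RBF": lambda x, y: np.exp(-0.5 * (x[:, None] - y[None, :])**2),
    "LIN": lambda x, y: x[:, None] * y[None, :],
    "PER": lambda x, y: np.exp(-2 * np.sin(np.pi * np.abs(x[:, None] - y[None, :]))**2),
}

# Grammar: K -> base | K + K | K * K. Sample a random composition.
def sample_kernel(rng, depth=2):
    if depth == 0 or rng.random() < 0.4:
        name = str(rng.choice(list(base)))
        return name, base[name]
    op = str(rng.choice(["+", "*"]))
    n1, k1 = sample_kernel(rng, depth - 1)
    n2, k2 = sample_kernel(rng, depth - 1)
    if op == "+":
        return f"({n1} + {n2})", lambda x, y: k1(x, y) + k2(x, y)
    return f"({n1} * {n2})", lambda x, y: k1(x, y) * k2(x, y)

rng = np.random.default_rng(1)
name, k = sample_kernel(rng)
x = np.linspace(0, 3, 6)
print(name)
print(np.round(k(x, x), 2))
```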


Automatic Generation of Probabilistic Programming from Time Series Data

Tong, Anh, Choi, Jaesik

arXiv.org Machine Learning

Probabilistic programming languages represent complex data with intermingled models in a few lines of code. Efficient inference algorithms in probabilistic programming languages make it possible to build unified frameworks to compute interesting probabilities for various large, real-world problems. When the structure of the model is given, constructing a probabilistic program is rather straightforward; thus, the main focus has been on learning the best model parameters and computing marginal probabilities. In this paper, we provide a new perspective on building expressive probabilistic programs from continuous time series data when the structure of the model is not given. The intuition behind our method is to find a descriptive covariance structure of the time series data via nonparametric Gaussian process regression. We report that such a descriptive covariance structure efficiently and accurately derives a probabilistic programming description.
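
A minimal sketch of the covariance-structure discovery step (in the spirit of this paper and the automatic-statistician line of work, not the authors' code, and reusing the toy base kernels from the sketch above): greedily grow a kernel composition, scoring each candidate by the GP log marginal likelihood; the winning expression is the descriptive covariance structure from which a probabilistic program can then be generated.

```python
import numpy as np

base = {
    "RBF": lambda x, y: np.exp(-0.5 * (x[:, None] - y[None, :])**2),
    "LIN": lambda x, y: x[:, None] * y[None, :],
    "PER": lambda x, y: np.exp(-2 * np.sin(np.pi * np.abs(x[:, None] - y[None, :]))**2),
}

def log_marginal_likelihood(K, y, noise=0.1):
    n = len(y)
    L = np.linalg.cholesky(K + noise**2 * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

def greedy_search(x, y, steps=2):
    """Greedily extend the best composition with '+ base' or '* base'."""
    score = lambda k: log_marginal_likelihood(k(x, x), y)
    name, k = max(base.items(), key=lambda kv: score(kv[1]))
    for _ in range(steps):
        cands = []
        for n2, k2 in base.items():
            cands.append((f"({name} + {n2})",
                          (lambda a, b: lambda u, v: a(u, v) + b(u, v))(k, k2)))
            cands.append((f"({name} * {n2})",
                          (lambda a, b: lambda u, v: a(u, v) * b(u, v))(k, k2)))
        name, k = max(cands, key=lambda c: score(c[1]))
    return name

rng = np.random.default_rng(0)
x = np.linspace(0, 4, 40)
y = np.sin(2 * np.pi * x) + 0.3 * x + 0.05 * rng.normal(size=40)
print(greedy_search(x, y))   # prints the selected composition, e.g. a PER/LIN mix
```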